21 research outputs found

    LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

    Recent developments in speech synthesis have produced systems capable of producing intelligible speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. HMM-based speech synthesis is of great interest to many researchers, due to its ability to produce sophisticated features with a small footprint. Despite such progress, its quality has not yet reached the level of the predominant unit-selection approaches, which select and concatenate recordings of real speech. Recent efforts have been made to improve these systems. In this paper we present the application of Long Short-Term Memory (LSTM) deep neural networks as a postfiltering step of HMM-based speech synthesis, in order to obtain spectral characteristics closer to those of natural speech. The results show how HMM-based voices can be improved using this approach.
    Comment: 5 pages, 5 figures.
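
    As a rough, hedged illustration of the postfiltering idea described in this abstract (not the authors' implementation; the feature dimension, layer sizes and training details below are assumptions), an LSTM network can be trained to map frames of HMM-synthesized spectral parameters to the corresponding natural-speech frames:

        # Hypothetical sketch of an LSTM postfilter: maps synthetic spectral frames
        # to natural ones. Dimensions and hyperparameters are illustrative only.
        import torch
        import torch.nn as nn

        class LSTMPostfilter(nn.Module):
            def __init__(self, n_features=40, hidden=256, layers=2):
                super().__init__()
                self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
                self.proj = nn.Linear(hidden, n_features)

            def forward(self, synthetic_frames):          # (batch, time, n_features)
                out, _ = self.lstm(synthetic_frames)
                return self.proj(out)                     # enhanced frames

        def train_step(model, optimizer, synthetic, natural):
            """One training step: minimize the distance to natural-speech parameters."""
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(model(synthetic), natural)
            loss.backward()
            optimizer.step()
            return loss.item()

        # Usage with random stand-in data (real data would be time-aligned pairs of
        # synthetic and natural parameter sequences, e.g. mel-cepstral coefficients):
        model = LSTMPostfilter()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        synthetic = torch.randn(8, 100, 40)
        natural = torch.randn(8, 100, 40)
        print(train_step(model, opt, synthetic, natural))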

    Post-training discriminative pruning for RBMs

    One of the major challenges in the area of artificial neural networks is the identification of a suitable architecture for a specific problem. Choosing an unsuitable topology can exponentially increase the training cost, and even hinder network convergence. On the other hand, recent research indicates that larger or deeper nets can map the problem features into a more appropriate space, and thereby improve the classification process, thus leading to an apparent dichotomy. In this regard, it is interesting to inquire whether independent measures, such as mutual information, could provide a clue to finding the most discriminative neurons in a network. In the present work we explore this question in the context of Restricted Boltzmann Machines, by employing different measures to perform post-training pruning. The neurons determined by each measure to be the most discriminative are combined, and a classifier is applied to the resulting network to determine its usefulness. We find that two measures in particular seem to be good indicators of the most discriminative neurons, producing savings of generally more than 50% of the neurons while maintaining an acceptable error rate. Further, it is borne out that starting with a larger network architecture and then pruning is more advantageous than starting with a smaller network. Finally, a quantitative index is introduced which can provide information on choosing a suitable pruned network.
    Authors: Máximo Sánchez Gutiérrez (Universidad Autónoma Metropolitana, México); Enrique Marcelo Albornoz (Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, CONICET / Universidad Nacional del Litoral, Argentina); Hugo Leonardo Rufiner (Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, CONICET / Universidad Nacional del Litoral, and Universidad Nacional de Entre Ríos, Argentina); John Goddard Close (Universidad Autónoma Metropolitana, México).
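
    A minimal sketch of the post-training pruning idea, assuming mutual information as the discriminability measure (the RBM size, dataset and 50% retention fraction below are placeholders, not the paper's setup):

        # Hypothetical sketch: prune RBM hidden units by mutual information with the
        # labels, then classify using only the retained units.
        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import BernoulliRBM

        X, y = load_digits(return_X_y=True)
        X = X / 16.0                                   # scale pixels to [0, 1] for the RBM
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        rbm = BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)
        H_tr = rbm.fit_transform(X_tr)                 # hidden-unit activation probabilities
        H_te = rbm.transform(X_te)

        # Rank hidden units by mutual information with the class labels and keep half.
        mi = mutual_info_classif(H_tr, y_tr, random_state=0)
        keep = np.argsort(mi)[-H_tr.shape[1] // 2:]

        full = LogisticRegression(max_iter=1000).fit(H_tr, y_tr).score(H_te, y_te)
        pruned = LogisticRegression(max_iter=1000).fit(H_tr[:, keep], y_tr).score(H_te[:, keep], y_te)
        print(f"accuracy with all units: {full:.3f}, with 50% most informative: {pruned:.3f}")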

    Un algoritmo para el entrenamiento de máquinas de vector soporte para regresión [An algorithm for training support vector machines for regression]

    The aim of the present paper is twofold. First, an introduction to the ideas of support vector regression is given. Then a new and simple algorithm, suggested by the work of Campbell and Cristianini [16], is proposed, which solves the corresponding quadratic programming problem in a straightforward fashion. The algorithm is illustrated with examples and compared with classical regression.
    Keywords: support vector machines, ε-support vector regression.
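
    A small illustration of ε-support vector regression compared with classical linear regression. This uses scikit-learn's generic SVR solver, not the algorithm proposed in the article; the data and parameters are made up:

        # Illustrative comparison of epsilon-SVR with ordinary least squares on noisy data.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
        y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # nonlinear target + noise

        svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)    # epsilon-insensitive loss
        ols = LinearRegression().fit(X, y)                        # classical regression baseline

        print("SVR MSE :", mean_squared_error(y, svr.predict(X)))
        print("OLS MSE :", mean_squared_error(y, ols.predict(X)))
        print("support vectors used:", len(svr.support_))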

    Un problema de localización de plantas de gran escala [A large-scale facility location problem]

    In this article a heuristic algorithm and its corresponding implementation are developed to solve a large-scale facility location problem, in which potentially more than 640 plants are to be located across the Mexican Republic. Originally, an exact solution to the problem was sought using two classical techniques: Benders decomposition and branch and bound. Both techniques are adequate and efficient for small problems, but the computer implementations for this problem did not converge after many hours of processing. A solution was therefore needed through a technique that might not give the exact solution, but would yield a solution of good quality. To solve this real-world problem, the simulated annealing technique was employed, with excellent results.
    Keywords: facility location, simulated annealing, heuristics.
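
    A compact sketch of simulated annealing applied to an uncapacitated facility location problem. The problem data, neighbourhood move and cooling schedule are simple assumptions for illustration, not the implementation described in the article:

        # Simulated annealing for a toy uncapacitated facility location problem:
        # choose which plants to open so that opening costs plus assignment costs are minimal.
        import numpy as np

        rng = np.random.default_rng(1)
        n_plants, n_clients = 50, 200
        open_cost = rng.uniform(50, 150, n_plants)                # cost of opening each plant
        assign_cost = rng.uniform(1, 30, (n_plants, n_clients))   # cost of serving client j from plant i

        def total_cost(open_mask):
            if not open_mask.any():
                return np.inf
            # each client is served by the cheapest open plant
            return open_cost[open_mask].sum() + assign_cost[open_mask].min(axis=0).sum()

        state = rng.random(n_plants) < 0.5                        # random initial set of open plants
        best, best_cost = state.copy(), total_cost(state)
        cost, temp = best_cost, 100.0

        for _ in range(20000):
            candidate = state.copy()
            candidate[rng.integers(n_plants)] ^= True             # flip one plant open/closed
            c = total_cost(candidate)
            if c < cost or rng.random() < np.exp((cost - c) / temp):
                state, cost = candidate, c
                if c < best_cost:
                    best, best_cost = candidate.copy(), c
            temp *= 0.9995                                        # geometric cooling

        print(f"{int(best.sum())} plants open, total cost {best_cost:.1f}")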

    HMM-Based Speech Synthesis Enhancement with Hybrid Postfilters

    In this chapter, we introduce hybrid postfilters into speech synthesis, with the objective of enhancing the quality of the synthesized speech. Our approach combines a Wiener filter with deep neural networks. Several attempts to enhance synthetic speech have contemplated single-stage deep-learning-based postfilters, which learn to map the synthetic speech parameters to the natural ones. In the synthetic speech produced by statistical methods we have measured low-level noise components, so a common single-stage postfilter must handle both the reduction of that component and the complex relationship between the parameters of the synthetic and the natural speech. That is why we consider a two-stage approach: in the first stage, the Wiener filter deals with the noise components of the synthetic speech; in the second stage, a set of multi-stream postfilters, which encompass a collection of autoencoders and auto-associative networks, deals with the relationship between the output of the Wiener filter and the natural speech. Results show that the hybrid approach succeeds in enhancing the synthetic speech in most cases compared to a single-stage approach.
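
    The two-stage structure can be sketched roughly as follows. This is a conceptual outline only: scipy's generic Wiener filter stands in for the noise-reduction stage, small dense networks stand in for the multi-stream postfilters, and the stream names and sizes are assumptions:

        # Conceptual outline of the two-stage hybrid postfilter: Wiener filtering followed
        # by per-stream neural refinement. Stream names, sizes and models are illustrative.
        import numpy as np
        import torch
        import torch.nn as nn
        from scipy.signal import wiener

        def make_refiner(dim):
            # stand-in for a per-stream autoencoder / auto-associative network
            return nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

        streams = {"mgc": 40, "lf0": 1, "bap": 5}            # assumed parameter streams
        refiners = {name: make_refiner(dim) for name, dim in streams.items()}

        def hybrid_postfilter(params):
            """params: dict of (time, dim) synthetic parameter trajectories."""
            enhanced = {}
            for name, traj in params.items():
                denoised = wiener(traj, mysize=(5, 1))       # stage 1: smooth along time only
                with torch.no_grad():                        # stage 2: learned refinement
                    enhanced[name] = refiners[name](torch.from_numpy(denoised).float()).numpy()
            return enhanced

        synthetic = {name: np.random.randn(200, dim) for name, dim in streams.items()}
        out = hybrid_postfilter(synthetic)
        print({k: v.shape for k, v in out.items()})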

    Speech synthesis based on Hidden Markov Models and deep learning

    Speech synthesis based on Hidden Markov Models (HMM) and other statistical parametric techniques has been a hot topic for some time. Using these techniques, speech synthesizers are able to produce intelligible and flexible voices. Despite this progress, the quality of the voices produced using statistical parametric synthesis has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech. Researchers now strive to create models that more accurately mimic human voices. In this paper we present our proposal to incorporate recent deep learning algorithms, especially Long Short-Term Memory (LSTM) networks, to improve the quality of HMM-based speech synthesis. Thus far, the results indicate that HMM-based voices can be improved in their spectral characteristics using this approach, but additional research should be conducted to improve other parameters of the voice signal, such as energy and fundamental frequency, in order to obtain more natural-sounding voices.
    Funding: Universidad de Costa Rica (UCR); Consejo Nacional de Ciencia y Tecnología (CONACyT), México, grant CB-2012-01, No. 182432.
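
    For context, the voice-signal parameters mentioned above (spectral envelope, fundamental frequency, energy) are typically obtained with a vocoder analysis step. A minimal sketch using the WORLD vocoder's Python bindings follows; pyworld, the file name and the energy definition are assumptions for illustration, since the abstract does not specify a vocoder:

        # Minimal vocoder analysis/synthesis round trip with the WORLD vocoder (pyworld).
        import numpy as np
        import pyworld
        import soundfile as sf

        x, fs = sf.read("speech.wav")                  # hypothetical mono recording
        x = np.ascontiguousarray(x, dtype=np.float64)  # WORLD expects float64 samples

        f0, t = pyworld.dio(x, fs)                     # raw fundamental-frequency contour
        f0 = pyworld.stonemask(x, f0, t, fs)           # F0 refinement
        sp = pyworld.cheaptrick(x, f0, t, fs)          # spectral envelope (frames x bins)
        ap = pyworld.d4c(x, f0, t, fs)                 # aperiodicity

        energy = np.log(np.sum(sp, axis=1) + 1e-10)    # simple per-frame log energy
        print(f0.shape, sp.shape, energy.shape)

        # A postfilter would modify sp (and possibly f0/energy) before resynthesis:
        y = pyworld.synthesize(f0, sp, ap, fs)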

    LSTM deep neural networks postfiltering for enhancing synthetic voices

    Recent developments in speech synthesis have produced systems capable of producing speech which closely resembles natural speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMM) is of great interest to researchers, due to its ability to produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech, and work has been conducted to try to improve HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a similar desire to obtain characteristics which are closer to those of natural speech. The paper analyzes four types of postfilters obtained using five voices, ranging from a single postfilter that enhances all the parameters to a multi-stream proposal that separately enhances groups of parameters. The different proposals are evaluated using three objective measures and are statistically compared to determine whether the differences between them are significant. The results described in the paper indicate that HMM-based voices can be enhanced using this approach, especially with the multi-stream postfilters on the considered objective measures.
    Funding: Universidad de Costa Rica (UCR); Consejo Nacional de Ciencia y Tecnología (CONACyT), México, grant CB-2012-01, No. 182432.
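
    The abstract does not name its three objective measures; mel-cepstral distortion is shown below only as a typical example of such a metric for comparing postfiltered and natural spectral parameters:

        # Mel-cepstral distortion (MCD), a common objective measure in parametric synthesis.
        import numpy as np

        def mel_cepstral_distortion(ref, syn):
            """ref, syn: (frames, order) mel-cepstral coefficients, 0th coefficient excluded."""
            diff = ref - syn
            return float(np.mean((10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))))

        # Toy usage with random stand-in data (real use would compare time-aligned
        # natural and postfiltered mel-cepstra):
        ref = np.random.randn(300, 24)
        syn = ref + 0.1 * np.random.randn(300, 24)
        print(f"MCD: {mel_cepstral_distortion(ref, syn):.3f} dB")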

    Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks

    Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811). Automatic speech recognition (ASR) systems suffer from performance degradation under noisy conditions. Recent work using deep neural networks to denoise spectral input features for robust ASR has proved to be successful. In particular, Long Short-Term Memory (LSTM) autoencoders have outperformed other state-of-the-art denoising systems when applied to the MFCCs of a speech signal. In this paper we also consider denoising LSTM autoencoders (DLSTMA), but instead use three different DLSTMAs and apply each to the MFCC, fundamental frequency, and energy features, respectively. Results are given using several kinds of additive noise at different intensity levels, and show how this collection of DLSTMAs improves the performance of the ASR in comparison with a single LSTM autoencoder.
    Funding: Universidad de Costa Rica (UCR); Consejo Nacional de Ciencia y Tecnología (CONACyT), México, grant CB-2012-01, No. 182432.
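
    A hedged sketch of one of the three stream-wise denoisers, a denoising LSTM autoencoder applied to MFCC frames; the architecture, sizes and data below are placeholders, not the paper's configuration. Three such models, one per stream (MFCC, F0, energy), would be trained the same way:

        # Sketch of a denoising LSTM autoencoder for one feature stream: it is trained
        # to map noisy frames to their clean counterparts.
        import torch
        import torch.nn as nn

        class DenoisingLSTMAutoencoder(nn.Module):
            def __init__(self, n_features=13, hidden=128):
                super().__init__()
                self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
                self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
                self.out = nn.Linear(hidden, n_features)

            def forward(self, noisy):                 # (batch, time, n_features)
                h, _ = self.encoder(noisy)
                h, _ = self.decoder(h)
                return self.out(h)

        model = DenoisingLSTMAutoencoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        noisy = torch.randn(16, 200, 13)              # stand-in for noisy MFCC sequences
        clean = torch.randn(16, 200, 13)              # stand-in for the corresponding clean MFCCs

        for _ in range(5):                            # a few illustrative training steps
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(noisy), clean)
            loss.backward()
            opt.step()
        print(loss.item())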

    Producción artificial de habla: Síntesis de voz [Artificial speech production: voice synthesis]

    At the end of the 1980s, the American inventor Raymond Kurzweil, creator of the first commercial system capable of taking text from a book and pronouncing it as artificial speech, wrote the book "The Age of Intelligent Machines". In it he makes a series of predictions about what he expected from technological development over the following decades. Among his forecasts is that by the early 2000s it would be possible to hold telephone conversations with people who speak a different language, with the telephone able to translate automatically and pronounce the message in the other language.

    Hybrid speech enhancement with Wiener filters and deep LSTM denoising autoencoders

    Over the past several decades, numerous speech enhancement techniques have been proposed to improve the performance of modern communication devices in noisy environments. Among them, there is a large range of classical algorithms (e.g. spectral subtraction, Wiener filtering and Bayesian-based enhancement) and, more recently, several deep neural network-based methods. In this paper, we propose a hybrid approach to speech enhancement which combines two stages: in the first stage, the well-known Wiener filter performs the task of enhancing noisy speech; in the second stage, a refinement is performed using a new multi-stream approach, which involves a collection of denoising autoencoders and auto-associative memories based on Long Short-Term Memory (LSTM) networks. We carry out a comparative performance analysis using two objective measures, with artificial noise added at different signal-to-noise levels. Results show that this hybrid system improves signal enhancement significantly in comparison with the Wiener filter and the LSTM networks used separately.
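
    The evaluation setup described (artificial noise added at controlled signal-to-noise ratios, scored with objective measures) can be sketched as follows. The SNR-based score shown here is an assumed stand-in for the paper's two unnamed measures, scipy's generic Wiener filter stands in for the first stage, and the LSTM refinement stage is omitted:

        # Sketch: mix clean speech with noise at a target SNR, enhance it, and measure
        # the resulting SNR. All signals and parameters are placeholders.
        import numpy as np
        from scipy.signal import wiener

        def mix_at_snr(clean, noise, snr_db_target):
            """Scale the noise so that the mixture has the requested SNR in dB."""
            noise = noise[: len(clean)]
            scale = np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db_target / 10)))
            return clean + scale * noise

        def snr_db(clean, estimate):
            return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

        rng = np.random.default_rng(0)
        clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # stand-in for a clean utterance
        noise = rng.normal(size=16000)

        for target in (0, 5, 10):                                    # different noise intensity levels
            noisy = mix_at_snr(clean, noise, target)
            enhanced = wiener(noisy, mysize=31)                      # first-stage enhancement only
            print(f"input SNR {target:2d} dB -> noisy {snr_db(clean, noisy):5.2f} dB, "
                  f"Wiener {snr_db(clean, enhanced):5.2f} dB")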